1.
RSC Adv ; 14(19): 13083-13094, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38655474

ABSTRACT

Members of the solute carrier transporter family 6 (SLC6) are of key interest for their critical role in the transport of small amino acids or amino acid-like molecules. Their dysfunction is strongly associated with human diseases such as schizophrenia, depression, and Parkinson's disease. Linking single point mutations to disease may support insights into the structure-function relationship of these transporters. This work aimed to develop a computational model for predicting the potential pathogenic effect of single point mutations in the SLC6 family. Missense mutation data were retrieved from UniProt, LitVar, and ClinVar, covering multiple protein-coding transcripts. As the encoding approach, amino acid descriptors were used to calculate the average sequence properties for both original and mutated sequences. In addition to the full-sequence calculation, the sequences were cut into twelve domains. The domains are defined according to the transmembrane domains of the SLC6 transporters to analyse the regions' contributions to the pathogenicity prediction. Subsequently, several classification models were built, namely Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), with hyperparameters optimized through grid search. Model performance was estimated using repeated stratified k-fold cross-validation. The accuracy values of the generated models are in the range of 0.72 to 0.80. Analysis of feature importance indicates that mutations in distinct regions of SLC6 transporters are associated with an increased risk for pathogenicity. When the model was applied to an independent validation set, accuracy dropped to about 0.6 on average, with high precision but low sensitivity.
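
The encoding step described in this abstract (average descriptor values for the original and mutated sequences, computed over the full sequence and over sequence segments) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the descriptors, so the Kyte-Doolittle hydropathy index stands in for them, and the toy sequence, the two segment boundaries, and all function names are hypothetical rather than taken from the paper.

```python
# Kyte-Doolittle hydropathy index, used here only as a stand-in descriptor
# (assumption: the abstract does not say which descriptors were used).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_descriptor(seq, scale=KD):
    """Average a per-residue scale over a sequence."""
    return sum(scale[aa] for aa in seq) / len(seq)


def apply_mutation(seq, pos, ref, alt):
    """Apply a missense mutation at a 1-based position."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def encode(seq, pos, ref, alt, segments):
    """Return [wild-type mean, mutant mean] for the full sequence,
    followed by the same pair for each (start, end) segment (1-based, inclusive)."""
    mutant = apply_mutation(seq, pos, ref, alt)
    features = [mean_descriptor(seq), mean_descriptor(mutant)]
    for start, end in segments:
        features.append(mean_descriptor(seq[start - 1:end]))
        features.append(mean_descriptor(mutant[start - 1:end]))
    return features


# Hypothetical toy input: an 11-residue sequence split into two segments.
toy_seq = "MALGVLIAAGK"
toy_segments = [(1, 5), (6, 11)]
features = encode(toy_seq, 3, "L", "D", toy_segments)
```

In this toy case only the first segment's mutant value changes, which is the kind of region-level signal a downstream classifier could use; the study itself used twelve segments defined by the transmembrane domains.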

2.
Mol Inform ; : e202300287, 2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38288682

ABSTRACT

In recent years, interest in solute carrier (SLC) transporters has increased due to their potential as drug targets. At the same time, macrocycles have demonstrated promising activities as therapeutic agents. However, the overall macrocycle/SLC transporter interaction landscape has not yet been fully revealed. In this study, we present a statistical analysis of macrocycles with measured activity against SLC transporters. Using a data mining pipeline based on KNIME, we retrieved a total of 825 bioactivity data points for macrocycles interacting with SLC transporters. For further analysis of the SLC inhibitor profiles, we developed an interactive KNIME workflow as well as an interactive map of the chemical space coverage using parametric t-SNE models. The parametric t-SNE models discriminate well among the targets of several SLC subfamilies. The KNIME workflow, the dataset, and the visualization tool are freely available to the community.

3.
Chem Res Toxicol ; 36(8): 1300-1312, 2023 Aug 21.
Article in English | MEDLINE | ID: mdl-37439496

ABSTRACT

Each year, publicly available databases are updated with new compounds from different research institutions. Positive experimental outcomes are more likely to be reported; therefore, they account for a considerable fraction of these entries. Established publicly available databases such as ChEMBL allow researchers to use information without restrictions and create predictive tools for a broad spectrum of applications in the field of toxicology. Therefore, we investigated the distribution of positive and nonpositive entries within ChEMBL for a set of off-targets and its impact on the performance of classification models when applied to pharmaceutical industry data sets. Results indicate that models trained on publicly available data tend to overpredict positives, and models based on industry data sets predict negatives more often than those built using publicly available data sets. This is further supported by the visualization of the prediction space for a set of 10,000 compounds, which makes it possible to identify regions in the chemical space where predictions converge. Finally, we highlight the use of these models in consensus modeling for the prediction of potential adverse events.


Subjects
Machine Learning , Databases, Factual
4.
J Comput Aided Mol Des ; 37(4): 183-200, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36943645

ABSTRACT

Multi-task learning in deep neural networks has become a topic of growing importance in many research fields, including drug discovery. However, applying multi-task learning poses new challenges in improving prediction performance. This study investigated the potential of training data enrichment to enhance multi-task model prediction quality in drug discovery. The study evaluated four scenarios with varying degrees of information capacity of the training data and applied two types of test data to evaluate prediction performance. We used three datasets: ViralChEMBL, which consisted of binary activities of compounds against viral species, was applied for the classification task; pQSAR(159) and pQSAR(4267), which consisted of bio-activities of compounds and assays from the research of the profile-QSAR method, were applied for regression tasks. We built multi-task models based on feed-forward DNNs using the PyTorch framework. Our findings showed that training data enrichment could be an effective means of enhancing prediction performance in multi-task learning, but the degree of improvement depends on the quality of the training data. The more unique compounds and targets the training data include, the more new compound-target interactions are required for prediction improvement. We also found that, even with multi-task learning, one could not predict the interactions of compounds that are highly dissimilar from those used for model training. The study provides some recommendations for effectively employing multi-task learning in drug discovery to improve prediction accuracy and facilitate the discovery of novel drug candidates.


Subjects
Drug Discovery , Neural Networks, Computer , Drug Discovery/methods
5.
ACS Omega ; 7(11): 9710-9719, 2022 Mar 22.
Article in English | MEDLINE | ID: mdl-35350354

ABSTRACT

Dissociation induced by the accumulation of internal energy via collisions of ions with neutral molecules is one of the most important fragmentation techniques in mass spectrometry (MS), and the identification of small singly charged molecules is based mainly on the consideration of the fragmentation spectrum. Many research studies have been dedicated to creating databases of experimentally measured tandem mass spectrometry (MS/MS) spectra (such as MzCloud, Metlin, etc.) and developing software for predicting MS/MS fragments in silico from the molecular structure (such as MetFrag, CFM-ID, CSI:FingerID, etc.). However, the fragmentation mechanisms and pathways are still not fully understood. One of the limiting obstacles is that protomers (positive ions protonated at different sites) produce different fragmentation spectra, and these spectra overlap when several protomers are present. Here, we propose to use a combination of two powerful approaches: computing fragmentation trees that carry information on all consecutive fragmentations, and considering the MS/MS data of isotopically labeled compounds. We have created PyFragMS, a web tool consisting of a database of annotated MS/MS spectra of isotopically labeled molecules (after H/D and/or 16O/18O exchange) and a collection of tools for computing fragmentation trees for an arbitrary molecule. Using PyFragMS, we investigated how the site of protonation influences the fragmentation pathway for small molecules. PyFragMS also offers capabilities for performing database searches in which the MS/MS data of the isotopically labeled compounds are taken into account.

6.
ACS Omega ; 6(45): 30743-30751, 2021 Nov 16.
Article in English | MEDLINE | ID: mdl-34805702

ABSTRACT

Humans prefer visual representations for the analysis of large databases. In this work, we suggest a method for the visualization of the chemical reaction space. Our technique uses the t-SNE approach that is parameterized using a deep neural network (parametric t-SNE). We demonstrated that the parametric t-SNE combined with reaction difference fingerprints could provide a tool for the projection of chemical reactions on a low-dimensional manifold for easy exploration of reaction space. We showed that the global reaction landscape projected on a 2D plane corresponds well with the already known reaction types. The application of a pretrained parametric t-SNE model to new reactions allows chemists to study these reactions in a global reaction space. We validated the feasibility of this approach for two commercial drugs, darunavir and montelukast. We believe that our method can help to explore reaction space and will inspire chemists to find new reactions and synthetic routes.

7.
Sci Rep ; 11(1): 14798, 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-34285269

ABSTRACT

We developed a Transformer-based artificial neural approach to translate between SMILES and IUPAC chemical notations: Struct2IUPAC and IUPAC2Struct. The overall performance level of our model is comparable to the rule-based solutions. We showed that the accuracy and speed of computations as well as the robustness of the model allow it to be used in production. Our showcase demonstrates that a neural-based solution can facilitate rapid development while keeping the required level of accuracy. We believe that our findings will inspire other developers to reduce development costs by replacing complex rule-based solutions with neural-based ones.

8.
Anal Bioanal Chem ; 412(28): 7767-7776, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32860519

ABSTRACT

Retention time is an important parameter for identification in untargeted LC-MS screening. Precise retention time prediction facilitates the annotation process and is well established in proteomics. However, the lack of available experimental information has long limited the prediction accuracy for small molecules. Recently introduced large databases of small-molecule retention times make reliable machine learning-based predictions possible for the whole diversity of compounds. Applying simple projections may extend these predictions to various LC systems and conditions. In our work, we describe a complex approach to predicting retention times for nano-HPLC that includes the sequential deployment of binary and regression gradient boosting models trained on the METLIN small-molecule dataset and a simple projection of the results onto nano-HPLC separations using a small number of easily available compounds. The proposed model outperforms previous attempts to use machine learning for predictions, with a 46-s mean absolute error. The overall performance after transfer to nano-LC conditions is less than 155 s (10.8%) in terms of the median absolute (relative) error. To illustrate the applicability of the described approach, we eliminated on average 25 to 42% of false positives with a filter threshold derived from ROC curves. Thus, the proposed approach should be used in addition to other well-established in silico methods, and their integration may broaden the range of correctly identified molecules.
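
The "simple projection" of predicted retention times onto another LC system, and the median absolute and relative errors quoted above, can be illustrated with a small sketch. It assumes the projection is a straight-line fit through a few calibration compounds, which the abstract does not confirm; all numbers and function names below are made up for illustration.

```python
from statistics import mean, median


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def median_abs_error(y_true, y_pred):
    """Median of |true - predicted|, in the same units as the inputs."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


def median_rel_error(y_true, y_pred):
    """Median of |true - predicted| / true."""
    return median(abs(t - p) / t for t, p in zip(y_true, y_pred))


# Hypothetical calibration compounds: model predictions (s) on the reference
# system and measured retention times (s) on the target nano-LC system.
calib_pred = [100.0, 200.0, 300.0, 400.0]
calib_meas = [250.0, 450.0, 650.0, 850.0]
slope, intercept = fit_line(calib_pred, calib_meas)

# Hypothetical held-out compounds used to score the projected predictions.
test_pred = [150.0, 350.0, 250.0]
test_meas = [360.0, 735.0, 570.0]
projected = [slope * p + intercept for p in test_pred]
mae = median_abs_error(test_meas, projected)
mre = median_rel_error(test_meas, projected)
```

The median is used rather than the mean so that a few badly predicted compounds do not dominate the reported error.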

9.
ACS Omega ; 5(25): 15039-15051, 2020 Jun 30.
Article in English | MEDLINE | ID: mdl-32632398

ABSTRACT

Recommender systems (RSs), which underwent rapid development and had an enormous impact on e-commerce, have the potential to become useful tools for drug discovery. In this paper, we applied RS methods for the prediction of the antiviral activity class (active/inactive) for compounds extracted from ChEMBL. Two main RS approaches were applied: collaborative filtering (Surprise implementation) and content-based filtering (sparse-group inductive matrix completion (SGIMC) method). The effectiveness of RS approaches was investigated for prediction of antiviral activity classes ("interactions") for compounds and viruses, for which some of their interactions with other viruses or compounds are known, and for prediction of interaction profiles for new compounds. Both approaches achieved relatively good prediction quality for binary classification of individual interactions and compound profiles, as quantified by cross-validation and external validation receiver operating characteristic (ROC) score >0.9. Thus, even simple recommender systems may serve as an effective tool in antiviral drug discovery.
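
The ROC score used above to quantify binary active/inactive prediction can be computed directly from its rank interpretation: the probability that a randomly chosen active compound is scored higher than a randomly chosen inactive one. The following is a minimal sketch with made-up labels and scores; it is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictions: 1 = active, 0 = inactive.
perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
partial = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a sort-based implementation would be preferable for large data sets.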

10.
ACS Omega ; 5(10): 5150-5159, 2020 Mar 17.
Article in English | MEDLINE | ID: mdl-32201802

ABSTRACT

In this work, we present graph-convolutional neural networks for the prediction of binding constants of protein-ligand complexes. We derived the model using multitask learning, where the target variables are the dissociation constant (Kd), inhibition constant (Ki), and half maximal inhibitory concentration (IC50). Rigorously trained on the PDBbind dataset, the model achieves a Pearson correlation coefficient of 0.87 and an RMSE of 1.05 in pK units, outperforming the recently developed 3D convolutional neural network model KDEEP.
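
For readers unfamiliar with the reported metrics, the sketch below shows how a binding constant in mol/L is converted to pK units and how the Pearson correlation coefficient and RMSE are computed. The numbers are invented for illustration and are unrelated to the PDBbind results quoted above.

```python
import math


def to_pk(k_molar):
    """Convert a constant in mol/L (Kd, Ki, or IC50) to pK = -log10(K)."""
    return -math.log10(k_molar)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical measured constants (mol/L) and model predictions in pK units.
measured_pk = [to_pk(k) for k in (1e-9, 1e-7, 1e-5)]
predicted_pk = [8.5, 7.2, 5.4]
r = pearson(measured_pk, predicted_pk)
error = rmse(measured_pk, predicted_pk)
```

Because pK is a logarithmic scale, an RMSE of about 1 pK unit corresponds to roughly a tenfold error in the underlying constant.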

11.
Anal Chem ; 91(21): 13465-13474, 2019 Nov 05.
Article in English | MEDLINE | ID: mdl-31490663

ABSTRACT

We present a novel approach for increasing the reliability of compound identification for LC-MS and MALDI imaging lipidomics. Our approach is based on the characterization of compounds not only by the elution time, accurate mass, and fragmentation spectra but also by the number of labile hydrogens that can be measured using the hydrogen/deuterium (H/D) exchange approach. The number of labile hydrogens (those from -OH and -NH groups) serves as an additional structural descriptor used when performing a database search. For LC-MS experiments, the H/D exchange was performed in the heating capillary of the modified electrospray ionization (ESI) source, while for MALDI imaging, the exchange was performed in the ion funnel at 10 Torr pressure. This approach achieved a considerable degree of deuteration, enough to unambiguously distinguish between different classes of lipids. The proposed analytical approach may be successfully used for the identification not only of lipids but also of peptides and metabolites. Dedicated software for the automatic filtration of molecules based on the number of functional groups was also developed.
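
The idea of using the number of labile hydrogens as an extra database-search filter can be sketched as below. This is a simplified illustration, not the software mentioned in the abstract: it assumes complete exchange, ignores whether the charge-carrying proton is itself exchanged, and the candidate list, m/z values, and function names are hypothetical.

```python
# Mass difference between deuterium and protium in Da (D - H).
DELTA_DH = 2.01410177812 - 1.00782503223


def exchanged_hydrogens(mz_native, mz_deuterated, charge=1):
    """Estimate how many hydrogens were exchanged from the observed m/z shift."""
    shift = (mz_deuterated - mz_native) * charge
    return round(shift / DELTA_DH)


def filter_by_labile_h(candidates, observed):
    """Keep candidates whose expected labile-hydrogen count matches the observation."""
    return [c for c in candidates if c["labile_h"] == observed]


# Hypothetical candidates sharing the same accurate mass but differing in the
# number of -OH/-NH hydrogens.
candidates = [
    {"name": "candidate_A", "labile_h": 0},
    {"name": "candidate_B", "labile_h": 2},
    {"name": "candidate_C", "labile_h": 3},
]
observed = exchanged_hydrogens(760.585, 762.598)
matches = filter_by_labile_h(candidates, observed)
```

In practice exchange is often incomplete, so a real filter would compare against the distribution of deuterated peaks rather than a single shift.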


Subjects
Chromatography, Liquid/methods , Deuterium/chemistry , Hydrogen/chemistry , Lipidomics/methods , Lipids/chemistry , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization , Animals , Brain Chemistry , Lipid Metabolism , Macaca mulatta , Male , Mice , Mice, Inbred C57BL
12.
Mol Inform ; 38(4): e1800108, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30499195

ABSTRACT

Despite the increasing volume of available data, the proportion of experimentally measured data remains small compared to the virtual chemical space of possible chemical structures. Therefore, there is a strong interest in simultaneously predicting different ADMET and biological properties of molecules, which are frequently strongly correlated with one another. Such joint data analyses can increase the accuracy of models by exploiting their common representation and identifying common features between individual properties. In this work we review the recent developments in multi-learning approaches as well as cover the freely available tools and packages that can be used to perform such studies.


Subjects
Chemistry/methods , Databases, Chemical , Informatics/methods , Machine Learning , Informatics/standards
13.
J Chem Inf Model ; 59(3): 1062-1072, 2019 Mar 25.
Article in English | MEDLINE | ID: mdl-30589269

ABSTRACT

Acute toxicity is one of the most challenging properties to predict purely with computational methods due to its direct relationship to biological interactions. Moreover, toxicity can be represented by different end points: it can be measured for different species using different types of administration, etc., and it is questionable whether knowledge transfer between end points is possible. We performed a comparative study of multitask toxicity prediction for a broad chemical space using different descriptors and modeling algorithms and applied multitask learning to a large toxicity data set extracted from the Registry of Toxic Effects of Chemical Substances (RTECS). We demonstrated that multitask modeling provides significant improvement over single-output models and other machine learning methods. Our research reveals that multitask learning can be very useful for improving the quality of acute toxicity modeling and raises a discussion about the use of multitask approaches for regulatory purposes. Our MultiTox models are freely available on the OCHEM platform (ochem.eu/multitox) under a CC-BY-NC license.


Subjects
Deep Learning , Models, Theoretical , Toxicity Tests, Acute , Animals , Endpoint Determination
14.
RSC Adv ; 9(9): 5151-5157, 2019 Feb 05.
Article in English | MEDLINE | ID: mdl-35514634

ABSTRACT

A parametric t-SNE approach based on deep feed-forward neural networks was applied to the chemical space visualization problem. It is able to retain more information than certain dimensionality reduction techniques used for this purpose (principal component analysis (PCA), multidimensional scaling (MDS)). The applicability of this method to some chemical space navigation tasks (activity cliffs and activity landscapes identification) is discussed. We created a simple web tool to illustrate our work (http://space.syntelly.com).

15.
J Phys Condens Matter ; 30(32): 32LT03, 2018 Aug 15.
Article in English | MEDLINE | ID: mdl-29964270

ABSTRACT

In this work, we present a new method for predicting complex physical-chemical properties of organic molecules. The approach uses a 3D convolutional neural network (ActivNet4) that takes solvent spatial distributions around solutes as input. These spatial distributions are obtained from a molecular theory called the three-dimensional reference interaction site model. We have shown that the method achieves good accuracy in predicting the bioconcentration factor, which is difficult to predict by direct application of molecular theory or simulation methods. Our research demonstrates that combining molecular theories with modern machine learning approaches can be effective for predicting properties that are otherwise inaccessible to purely theory-based models.


Subjects
Models, Molecular , Neural Networks, Computer , Linear Models , Molecular Conformation , Thermodynamics
16.
J Chem Inf Model ; 58(5): 1083-1093, 2018 May 29.
Article in English | MEDLINE | ID: mdl-29689160

ABSTRACT

Most of the common molecular descriptors have numerous different implementations. This can influence the results of compound prioritization based on the multiparameter assessment (MPA) approach that allows a medicinal chemist to simultaneously analyze and achieve the desired balance of the diverse and often conflicting molecular and pharmacological properties. In this study, we analyzed the feasibility of using different implementations of common descriptors (logP, logS, TPSA, logBB, hERG, nHBA) interchangeably in predesigned sets of requirements in the course of multiparameter compound optimization. The influence of methods of descriptor calculation, continuity or discreteness of their values, their applicability domains, as well as of the nature of desirability functions in an MPA profile was examined in terms of the stability of MPA compound ranking. It was shown that the interchangeable use of different methods of descriptor calculation is reliably acceptable only for continuously distributed parameters transformed by a smooth desirability function. If a descriptor in an MPA scheme is discretely distributed, only the implementation that was used for building the scoring profile may be used for assessment. An inconsistency of assessment due to different applicability domains of descriptors was also demonstrated.
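
The finding that smooth desirability functions tolerate interchangeable descriptor implementations better than discrete ones can be illustrated with a toy comparison. The step and logistic functions, the threshold of 3.0, and the two logP values below are assumptions chosen for illustration; they are not the scoring profiles used in the study.

```python
import math


def step_desirability(value, threshold):
    """Hard cutoff: fully desirable at or below the threshold, otherwise not at all."""
    return 1.0 if value <= threshold else 0.0


def smooth_desirability(value, threshold, steepness=1.0):
    """Logistic curve that decreases smoothly as the value passes the threshold."""
    return 1.0 / (1.0 + math.exp(steepness * (value - threshold)))


# Hypothetical logP values for the same compound from two implementations.
logp_impl_a = 2.9
logp_impl_b = 3.1
threshold = 3.0

step_gap = abs(step_desirability(logp_impl_a, threshold)
               - step_desirability(logp_impl_b, threshold))
smooth_gap = abs(smooth_desirability(logp_impl_a, threshold)
                 - smooth_desirability(logp_impl_b, threshold))
```

A difference of 0.2 log units between implementations flips the step score from 1 to 0 but shifts the smooth score by only about 0.05, so a ranking built on the smooth function is far less sensitive to which implementation is used.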


Subjects
Databases, Pharmaceutical , Informatics/methods , Algorithms , Drug Discovery , Quantitative Structure-Activity Relationship